PWN Your Infrastructure:
Behind Call of Duty: World at War
Jason LaPorte (jason@agoragames.com)
Agora Games (http://www.agoragames.com/)
What does Agora Games do?
We make video games awesome.
(And build community-driven websites.)
Who am I?
Infrastructure
- Web Server (NGINX)
- Load Balancer (HAProxy)
- Application Stack (Thin, Rails)
- Database (MySQL)
- Operating System (Ubuntu Linux)
- Network (Firewalls, NFS)
- Tools
I'm going to focus on the system side.
(For the application-level stuff, check out
our Guitar Hero talk later today!)
What's wrong with a
typical Rails deployment?
Scalability!
(Of the administrator's time.)
Ideally: [figure omitted]
In the real world: [figure omitted]
(If you're me...) [figure omitted]
One server is trivial.

Three servers is easy.

Twelve servers is tricky.

Fifty servers is a time sink.

What starts to fail?
- System updates/fixes take forever.
- Hardware errors become frequent.
- Transient network failures occur more often.
- Unpredictable failures become common.
- Design shortcomings become apparent.
- etc.
Capistrano would help, but...
- It's synchronous.
- Failures aren't localized.
- Ruby isn't a clean way to specify
system tasks anyway.
We also need automation and
centralized monitoring!
There's a lot that
needs fixing here.
- Failures must be designed around.
- Repetitive tasks must be abstracted away.
- Monitoring information needs
to be made accessible.
- Deploys need to be centralized
and simplified.
Design Goals
- KISS
- When in Rome...
- Teach a man to fish...
Designing Around Failures
- Virtualization for hardware problems:
Terremark (http://www.terremark.com/).
- Replication for software problems.
(Virtualization also gives us
a lot of flexibility!)
Abstracting Repetition
/usr/local is propagated via NFS.
(Updating code is quick and painless.)
(Well, almost painless.)
Abstracting Repetition
What about configuring a myriad of servers?
Abstracting Repetition
Well, we did what you usually do when
you have too many units to manage...
...we spawned more overlords.

Overlord 
A (very) simple Rails app that does two things:
- Centralizes configuration.
- Aggregates monitoring information.
(Sorry, it's currently proprietary.)
Overlord
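#!/bin/sh
# Fetch this host's configuration script from Overlord and run it.
OVERLORD=overlord.example.com
HOSTNAME=$(hostname)
CONFIG_URL="http://$OVERLORD/hosts/config/$HOSTNAME"
# -f makes curl fail on HTTP errors, so an error page is never executed.
curl -sf "$CONFIG_URL" >/tmp/autoconfig.sh || exit 1
/bin/sh /tmp/autoconfig.sh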
Overlord
Two models. A Host has_many Configurations.
Overlord
Each Configuration represents
a file on the host.
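Since Overlord itself is proprietary, the following is only a guess at what a generated autoconfig script could look like if each Configuration is emitted as one file write. The function name install_config, the CONFIG_ROOT variable, and the example file are all hypothetical and not taken from Overlord.

```shell
#!/bin/sh
# Hypothetical sketch of a generated autoconfig script (not Overlord's real output).

# install_config PATH MODE: write stdin to PATH through a temporary file,
# so an interrupted run never leaves a half-written config behind.
install_config() {
    target=$1
    mode=$2
    tmp="$target.tmp.$$"
    cat >"$tmp" || { rm -f "$tmp"; return 1; }
    chmod "$mode" "$tmp" || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$target"
}

# CONFIG_ROOT is an assumption so the sketch can be tried outside /etc.
CONFIG_ROOT=${CONFIG_ROOT:-/tmp/overlord-demo}
mkdir -p "$CONFIG_ROOT"

# One call like this per Configuration record belonging to the Host.
install_config "$CONFIG_ROOT/motd" 644 <<'EOF'
Managed by Overlord. Local edits will be overwritten.
EOF
```

Writing to a temporary file and renaming it means a daemon that reads the file sees either the old version or the new one, never a partial write.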
Overlord
Monit does the rest.
Monit
http://mmonit.com/monit/

Monit
We rely on it for just about everything.
- System monitoring.
- Starting daemons.
- Ensuring liveness.
- Email alerts.
- etc.
Overlord
We pull XML from Monit, and feed it
into RRDTool for graphing.
http://.../_status?format=xml
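A minimal sketch of that pipeline, under several assumptions: that Monit's XML reports the one-minute load average in an avg01 element, that Monit's HTTP interface listens on its default port 2812 without authentication, and that an RRD file with a single data source already exists for each host. The function names and the RRD_DIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: pull one metric from Monit's XML status and store it in RRDTool.

# xml_value TAG: print the text of the first <TAG>...</TAG> element on stdin.
# Splitting on '<' keeps this working whether or not the XML contains newlines.
xml_value() {
    tr '<' '\n' | sed -n "s:^$1>::p" | head -n 1
}

# record_load HOST: fetch HOST's status and append its load average to an RRD.
# Assumes $RRD_DIR/HOST-load.rrd was created beforehand with `rrdtool create`.
record_load() {
    host=$1
    load=$(curl -sf "http://$host:2812/_status?format=xml" | xml_value avg01)
    [ -n "$load" ] || return 1          # skip the update if nothing was parsed
    rrdtool update "$RRD_DIR/$host-load.rrd" "N:$load"
}
```

Calling record_load for each host from cron would give one data point per interval; a real XML parser would be safer than sed if more than a few values are needed.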
RRDTool
http://oss.oetiker.ch/rrdtool/

Monit + RRDTool
Why not one of the alternatives?
- Nagios (clunky)
- ZABBIX (ditto)
- MMonit (see above)
- Cacti (great, but limited)
- Munin (wreaks havoc on system resources)
Centralizing Deploys
Deploying is really a three step process:
- Update your code.
- Update your environment.
- Restart your servers.
Centralizing Deploys
Code has to be propagated to all app servers.
We used to use Capistrano for this.
Now we're using NFS.
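#!/bin/sh
# <set up variables here>
set -e  # abort the deploy at the first failing step
# deploy: export a clean copy of the code into a new release directory
svn -q export "$REPOSITORY" "$NEW_RELEASE_DIR"
chmod -R g+w "$NEW_RELEASE_DIR"
# symlink: point "current" at the new release and share logs and pids
rm -f "$CURRENT_DIR"
ln -s "$NEW_RELEASE_DIR" "$CURRENT_DIR"
ln -s "$SHARED_DIR/log" "$CURRENT_DIR/log"
ln -s "$SHARED_DIR/pids" "$CURRENT_DIR/tmp/pids"
# migrate
cd "$CURRENT_DIR" && rake db:migrate RAILS_ENV=production
# restart: touching this file signals a restart
touch "$SHARED_DIR/pids/restart.touch"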
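Replacing the current symlink in two steps leaves a brief window in which no current release exists, and every host mounting the NFS share sees that window. One possible refinement, which is not part of the original script, is to build the new link under a temporary name and rename it into place. The sketch assumes GNU mv (for -T), as shipped with Ubuntu; the function name switch_release is hypothetical.

```shell
#!/bin/sh
# Hypothetical refinement: swap the "current" symlink with a single rename.

# switch_release RELEASE_DIR CURRENT_LINK: repoint CURRENT_LINK at RELEASE_DIR.
# The new link is created under a temporary name and then renamed over the
# old one, so CURRENT_LINK always resolves to some release.
switch_release() {
    release=$1
    current=$2
    tmp_link="$current.new.$$"
    ln -s "$release" "$tmp_link" || return 1
    # -T treats the destination as a plain file, so the old symlink is
    # replaced instead of the new link being moved inside the old release.
    mv -T "$tmp_link" "$current"
}
```

Rolling back is then the same call with the previous release directory as the first argument.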
Wrapping Up
- Virtualization.
- Mirrored filesystem (NFS).
- Monit + RRDTool = Overlord.
- Good old-fashioned scripting.
Favoring existing conventions saves time.
Where next?
- Better abstractions for Overlord.
- Automagic zero-downtime
configuration updates.
- Centralized logging, alerting
(integrated with RRDTool).
- A better NFS...?